fix(qwp): fix missed terminal send errors during Sender close by jerrinot · Pull Request #42 · questdb/java-questdb-client

jerrinot · 2026-06-09T12:46:22Z

Fixes a race where QwpWebSocketSender.close() could return successfully even after a terminal QWP/WebSocket error was latched during close, and a related corner where close() could rethrow a wrapped terminal error the user had already caught from flush().

implementation details for reviewers

Track the exact latched instance that checkError() has synchronously surfaced and make close() suppress only that pre-owned instance, taken as the single read getSynchronouslySurfacedError(). This avoids treating a freshly latched terminal as already owned when it appears between two separate latch reads. Guarded by CloseOwnershipRaceTest (probabilistic accessor race, near-certain catch per CI run, ~0.15s).
Simplify the latch so its invariants live in one place: recordFatal wraps non-LineSenderException causes once at latch time, so every rethrow delivers the same instance — close()'s identity suppression becomes correct in every state, which also fixes a pre-existing corner where drainOnClose() re-threw a fresh wrapper for an error the user already caught. checkUnsurfacedError() encapsulates the close() safety net and hasUnsurfacedError() becomes private, so the racy two-read snapshot is no longer expressible outside the class. The typed SenderError payload is derived from the latched LineSenderServerException instead of being tracked in a sibling field.
Rename the latch field lastError to terminalError (and getLastError() to getTerminalError()): the field is a write-once, first-writer-wins latch — transient failures reconnect upstream and never reach it — so "last" was misleading.
CloseSafetyNetTest pins both branches of close()'s safety net deterministically: a config-string-only caller who never flushed must see the latched terminal thrown from close(); a caller whose custom error handler already received it must get a silent close.

Fixes a race where QwpWebSocketSender.close() could return successfully even after a terminal QWP/WebSocket error was latched during close. **implementation details for reviewers** Track the exact `lastError` instance that `checkError()` has synchronously surfaced and make `QwpWebSocketSender.close()` suppress only that pre-owned instance. This avoids treating a freshly latched terminal error as already owned when it appears between separate `hasUnsurfacedError()` and `getLastError()` reads.

The latch leaked its invariants into callers: close() had to compose hasUnsurfacedError() + checkError() for the safety-net rethrow, and checkError() minted a fresh wrapper per call for raw causes, so drainOnClose() could rethrow an error the user had already caught. - recordFatal() wraps non-LineSenderException causes once, at latch time: every rethrow delivers the same instance, making close()'s identity suppression correct in every state. - checkUnsurfacedError() encapsulates the close() safety net; hasUnsurfacedError() becomes private, so the torn two-read ownership snapshot is no longer writable outside the class. - getLastTerminalServerError() derives the SenderError from the latched LineSenderServerException; the sibling field is gone.

No sender-level test asserted WHETHER close() throws: the existing close assertions (InitialConnectAsyncTest) install a handler and tolerate both outcomes, so deleting the safety net, inverting its handler gate, or always-suppressing the snapshot all survived the suite. CloseSafetyNetTest adds the strict pair, each awaiting the latched terminal deterministically before close(): - a config-string-only caller who never flushed must get the typed terminal thrown from close(); - a caller whose custom handler already received the error must get a silent close(). All three mutations above now fail the suite. The snapshot race itself stays covered by CloseOwnershipRaceTest at the accessor. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

lastError suggested most-recent-wins, but the field is a write-once first-writer-wins latch: transient failures reconnect upstream and never reach it; only the error that ends the loop is latched. The name now matches the documented invariant and the "latched terminal" language used throughout. recordFatal's javadoc also states the single-writer rule explicitly: the unsynchronized check-then-latch is only safe because every caller runs on the I/O thread. The retry-loop locals keep the lastError name — there it is accurate. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

mtopolnik · 2026-06-11T20:02:24Z

[PR Coverage check]

😍 pass : 24 / 28 (85.71%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java	20	24	83.33%
🔵	io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java	4	4	100.00%

jerrinot added the bug Something isn't working label Jun 9, 2026

jerrinot and others added 7 commits June 11, 2026 14:15

Merge remote-tracking branch 'origin/main' into jh_close_error_race

adf9fbb

probabilistic test

e6088bd

Merge remote-tracking branch 'origin/main' into jh_close_error_race

55e64fa

style

7a719ef

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(qwp): fix missed terminal send errors during Sender close#42

fix(qwp): fix missed terminal send errors during Sender close#42
jerrinot wants to merge 8 commits into
mainfrom
jh_close_error_race

jerrinot commented Jun 9, 2026 •

edited

Loading

Uh oh!

mtopolnik commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jerrinot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mtopolnik commented Jun 11, 2026

[PR Coverage check]

file detail

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jerrinot commented Jun 9, 2026 •

edited

Loading